On Low Distortion Embeddings of Statistical Distance Measures into Low Dimensional Spaces

Authors

  • Arnab Bhattacharya
  • Purushottam Kar
  • Manjish Pal
Abstract

Statistical distance measures have found wide applicability in information retrieval tasks that typically involve high dimensional datasets. In order to reduce the storage space and ensure efficient performance of queries, dimensionality reduction while preserving the inter-point similarity is highly desirable. In this paper, we investigate various statistical distance measures from the point of view of discovering low distortion embeddings into low-dimensional spaces. More specifically, we consider the Mahalanobis distance measure, the Bhattacharyya class of divergences and the Kullback-Leibler divergence. We present a dimensionality reduction method based on the Johnson-Lindenstrauss Lemma for the Mahalanobis measure that achieves arbitrarily low distortion. By using the Johnson-Lindenstrauss Lemma again, we further demonstrate that the Bhattacharyya distance admits dimensionality reduction with arbitrarily low additive error. We also examine the question of embeddability into metric spaces for these distance measures due to the availability of efficient indexing schemes on metric spaces. We provide explicit constructions of point sets under the Bhattacharyya and the Kullback-Leibler divergences whose embeddings into any metric space incur arbitrarily large distortions. We show that the lower bound presented for Bhattacharyya distance is nearly tight by providing an embedding that approaches the lower bound for relatively small dimensional datasets.
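The abstract does not spell out the construction, so the following is only a minimal sketch of one standard way a Johnson-Lindenstrauss style reduction can be applied to the Mahalanobis distance, not necessarily the authors' method. It assumes the distance is d(x, y) = sqrt((x - y)^T A (x - y)) for a symmetric positive definite matrix A, factors A = L L^T by Cholesky so that the distance equals the Euclidean distance between L^T x and L^T y, and then applies a Gaussian random projection. The matrix `a`, the sizes `dim` and `target`, and all function names are hypothetical choices made for illustration; only the Python standard library is used.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][t] * low[j][t] for t in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def mahalanobis(x, y, a):
    """Direct evaluation of sqrt((x - y)^T a (x - y))."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    return math.sqrt(sum(d[i] * a[i][j] * d[j] for i in range(n) for j in range(n)))


def whiten(x, low):
    """Map x to L^T x, turning the Mahalanobis distance into a Euclidean one."""
    n = len(x)
    return [sum(low[i][j] * x[i] for i in range(j, n)) for j in range(n)]


def gaussian_projection(dim_in, dim_out, rng):
    """Random dim_out x dim_in matrix with independent N(0, 1/dim_out) entries."""
    scale = 1.0 / math.sqrt(dim_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)] for _ in range(dim_out)]


def project(matrix, x):
    """Matrix-vector product."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in matrix]


def euclidean(u, v):
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


rng = random.Random(0)
dim, target = 120, 64  # hypothetical original and reduced dimensions

# Hypothetical positive definite matrix with entries 0.5^|i-j|.
a = [[0.5 ** abs(i - j) for j in range(dim)] for i in range(dim)]
low = cholesky(a)
proj = gaussian_projection(dim, target, rng)

x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
y = [rng.gauss(0.0, 1.0) for _ in range(dim)]

exact = mahalanobis(x, y, a)                            # original distance
whitened = euclidean(whiten(x, low), whiten(y, low))    # same value, Euclidean form
reduced = euclidean(project(proj, whiten(x, low)),
                    project(proj, whiten(y, low)))      # approximate, in `target` dims
print(exact, whitened, reduced)
```

The whitening step is exact up to floating-point error; only the random projection introduces distortion, which shrinks as `target` grows. In practice the reduced dimension would be chosen from the number of points and the tolerated distortion, as the Johnson-Lindenstrauss Lemma prescribes.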

Similar resources

On low dimensional local embeddings

We study the problem of embedding metric spaces into low dimensional $L_p$ spaces while faithfully preserving distances from each point to its k nearest neighbors. We show that any metric space can be embedded into $L_p^{O(e^p \log^2 k)}$ with k-local distortion of $O((\log k)/p)$. We also show that any ultrametric can be embedded into $L_p^{O(\log k)/\epsilon^3}$ with k-local distortion $1 + \epsilon$. Our embedding results have immediat...

Computational metric embeddings

We study the problem of computing a low-distortion embedding between two metric spaces. More precisely given an input metric space M we are interested in computing in polynomial time an embedding into a host space M ′ with minimum multiplicative distortion. This problem arises naturally in many applications, including geometric optimization, visualization, multi-dimensional scaling, network spa...

Spanners with Slack

Given a metric (V, d), a spanner is a sparse graph whose shortest-path metric approximates the distance d to within a small multiplicative distortion. In this paper, we study the problem of spanners with slack : e.g., can we find sparse spanners where we are allowed to incur an arbitrarily large distortion on a small constant fraction of the distances, but are then required to incur only a cons...

Low dimensional embeddings of ultrametrics

In this note we show that every n-point ultrametric embeds with constant distortion in $\ell_p^{O(\log n)}$ for every $\infty \ge p \ge 1$. More precisely, we consider a special type of ultrametric with hierarchical structure called a k-hierarchically well-separated tree (k-HST). We show that any k-HST can be embedded with distortion at most $1 + O(1/k)$ in $\ell_p^{O(k^2 \log n)}$. These facts have implications to embedding...

Random Feature Maps for Dot Product Kernels

Approximating non-linear kernels using feature maps has gained a lot of interest in recent years due to applications in reducing training and testing times of SVM classifiers and other kernel based learning algorithms. We extend this line of work and present low distortion embeddings for dot product kernels into linear Euclidean spaces. We base our results on a classical result in harmonic anal...

Publication date: 2009